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The quasispecies model of biological evolution for asexual organisms such as bacteria 
and viruses has attracted considerable attention of biological physicists. Many variants of 
the model have been proposed and subsequently solved using the methods of statistical 
physics. In this paper I will put forward important but largely overlooked relations between 
localization theory, random matrices, and the quasispecies model. These relations will help 
me to study the dynamics of this model. In particular, I will show that the distribution of 
times between evolutionary jumps in the genotype space follows a power law, in agreement 
with recent findings in the shell model - a simplified version of the quasispecies model. 

PACS numbers: 87.23.Kg, 87.10.-e, 02.50.-r, 05.70.Fh 

1. Introduction 

The theory of biological evolution is a pillar of modern biology and has been inspiring 
scientists since the time of Darwin. Evolution operates by two primary processes: natural 
selection, in which best adapted organisms outcompete less adapted ones, and variability in 
different traits of individual organisms, which are passed down from generation to generation. 
Although natural selection was supported by rather strong experimental evidence already 
150 years ago, it was only in 1950's when the discovery of the role of DNA in heredity and, 
subsequently, the explanation of its molecular structure provided a molecular basis for genetic 
variability. Today we know that this variability is related to changes to the genetic material 
stored in DNA, either by DNA exchange in sexual reproduction or microbial conjugation, or 
by mutations - random changes in the sequence of nucleotides which form the DNA chain. 

All organisms capable of self-reproduction which exist today are complicated systems of 
many coupled chemical reactions, usually taking place in isolated compartments - cells - 
or even smaller regions within cells. Is evolution a phenomenon which started to operate 
after living organisms had emerged a few billion of years ago, or was it preceded by a similar 
process acting on molecules floating freely in the oceans? In an attempt to answer that 
question, Eigen and Schuster conceived a theoretical model pj explaining how selection and 
mutation could work already on the level of single chemical molecules. In their model, 
macromolecules such as DNA or RNA, which are linear polymers composed of nucleotides, 
were subjected to error-prone replication. If errors (mutations) were rare, the molecule with 
the highest replication rate soon dominated the population. For increasing mutation rate, 
however, the fittest molecule was surrounded by an expanding cloud of mutants in the space 
of all possible sequences. Eigen and Schuster called this cloud "quasispecies", by analogy to 
the concept of a biological species, which consists of closely related genotypes. 

The model predicts that if the mutation rate increases beyond some critical value - 
the error threshold - the quasispecies becomes "delocalized". This means that the fittest 
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molecule corresponding to some particular sequence of nucleotides is lost and all possible 
sequences start to appear. If these other sequences are much less fit or even incapable of 
reproduction (lethal mutations), the population will inevitably die out. This is called the 
error catastrophe. The error threshold is predicted to decrease with increasing length of the 
sequence, suggesting that for a given mutation rate, the amount of genetic information stored 
in a self-replicating molecule is restricted. On the other hand, higher mutation rate means 
improved adaptability to changing conditions. Indeed, it has been found experimentally [2], 
that the evolution of some viruses such as HIV operates very close to the error threshold. 

The molecular quasispecies theory has attracted considerable attention not only from 
biologists but also mathematicians and physicists. In particular, the quasispecies model 
has been studied by mapping it onto Ising spin chains [21 SI [5l [6], directed polymers [7], 
or, more recently, Anderson localization [HIE], and has been solved exactly in some special 
cases |5]lini[II]- The purpose of this work is to re-examine the quasispecies model, or, more 
precisely, its version with parallel mutation and selection [5], and to show an analogy between 
this model and random matrices which appear in the theory of Anderson localization. I will 
first define the quasispecies model and discuss its several versions which can be found in the 
literature. Next, I will show how some questions fundamental to the quasispecies theory can 
be rephrased using the language of random matrix theory (RMT). Finally, I will show how 
to answer these questions using some simple concepts borrowed from RMT. 



The quasispecies model \T\ describes biological evolution of simple organisms which re- 
produce asexually in a chemostat - a bioreactor with constant supply of fresh nutrients and 
removal of liquid culture and waste products, so that the culture volume is kept constant. 
Every organism has a "genotype" which is a sequence of length L composed of symbols taken 
from some finite alphabet. The symbols could correspond to four different nucleotides which 
are the building blocks of DNA, but one usually considers binary sequences composed of only 
two symbols and 1. This assumption simplifies calculations but it has no qualitative effect 
on any properties of the model I will mention in this manuscript. The number of all possible 
genotypes is = 2^. Each genotype can be labelled by an integer number i = 1,. . . ,N, 
and the binary representation of i — 1 gives the corresponding binary sequence. 

The genotypes reproduce by replication. However, the copy procedure is not error-free: 
each symbol has a finite probability 7 of being substituted by another (randomly chosen) 
symbol in the process of replication. Assuming that errors are made independently at any 
position in the sequence, the matrix Qji which describes the probability of mutation from 
genotype i to genotype j reads 



where d{i,j) is the Hamming distance between the genotypes i.e., d{i,j) is the number 
of positions at which the corresponding sequences are different. For j = i, Qa = (1 — 7)^ 
gives the probability of replicating without errors. By definition, Qji is a symmetric, square, 
doubly stochastic matrix: Qji = Qij = 1. 

If we denote by (j)i the specific growth rate of genotype i, we can describe the model by 
the following reactions 



2. Quasispecies model 



(1) 



genotype i 




genotype i + genotype i 
genotype i + genotype j 



(with probability Qa) 
(with probability Qji) 



(2) 



in which (j)i above the arrow means that the reaction constant is (j)i. Suppose now that the 
population is very large so that we can neglect stochastic fiuctuations of the numbers of 
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genotypes. Then, the time evolution of the abundances (number densities) ni, . . . ,ni\i of 
genotypes i = 1, . . . , is modelled by the set of differential equations: 

d ^ 

— nj(t) = ^^QijcjijUjit) -ni{t)J{t), (3) 

where J(t) is the rate at which organisms (molecules) are washed out from the system. 
The term J{t) forces the system to evolve towards the steady state, otherwise the growth 
would always be exponential. In this paper, J(t) is assumed to be proportional to the 
overall concentration Yli^ii^)- This causes the net growth rates to become negative if the 
population is too dense, as if the organisms competed for limited resources. 

Since J{t) depends on ni{t), the quasispecies equation Q is non- linear. However, a 
simple change of variables: 

Xi{t) = mit) exp (^J^ J{t')dt'^ (4) 

reduces Eq. ([3]) to a linear equation, 

d ^ 

— Xi(t) = ^ Qij(t)jXj{t). (5) 
i=i 

As we are usually interested only in the ratios of different Tij's (relative concentrations of 
genotypes), we can use Xi instead of rii to describe the state of the system. The above 
equation can be rewritten as 

^x(t) = Wx{t), (6) 

where the matrix Wij = Qij4>j- Let Ai > A2 > • • • > Aat be the set of eigenvalues of W, and 
{ipi} denote the corresponding eigenvectors: 

W^i = Xitfi. (7) 
Then, Eq. ^ has the following solution 

x{t) = e*^f (0) = J2 (V^* • ^(0)) (8) 

i 

which for large times reduces to x{t 00) oc ■i/'i- Therefore, the steady state of Eq. ([3]) is 
proportional to the eigenvector -01 to the largest eigenvalue Ai of the matrix W . This gives 
the steady-state abundances 

n* = n{t ^ 00) = iPi. (9) 

The physical properties of the formal solution from Eq. (|8]) and Eq. ([9]) depend on the 
choice of the growth rates {(pi}, which specify "fitnesses" of different genotypes, i.e., how well 
they are adapted to the environment. The graph of possible mutations plus the fitnesses is 
usually referred to as the fitness landscape. This metaphor is based on viewing fitness peaks 
as mountains of different heights, with the population climbing generally uphill in the course 
of evolution and moving from lower to higher peaks, until the highest peak (global fitness 
maximum) is reached. Although frequently used in population biology, the concept of fitness 
has had an important drawback: until recently little information was available on real fitness 
landscapes because experimental evaluation of the fitness for large numbers of genotypes is 
very difficult. Lacking experimental data, many models for the fitness landscape have been 
considered, without a priori knowledge of which one is correct. Some of the most popular 
choices are listed below: 
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1. single-peak landscape [lUJ : the fitness <j)i of a single genotype is taken to be maximal, 
and (j)2 = ' ' ' = <pN are assumed to be smaller than (pi. This corresponds to a situation 
in which there is one best-fit genotype (master sequence) and all mutants are less fit. 

2. multiplicative landscape [llj: (pi is maximal, and (pi = (pi{l — s)'^^^'^^ where s is some 
positive constant and d{l,i) is the Hamming distance between the sequences 1 and i. 
In this model, each single-symbol mutation lowers the fitness. 

3. "holey" landscape: the fitness of each genotype is either large (fit genotypes) or small 
(unfit genotypes). Thus, in the fitness landscape, there are holes of unfit genotypes 
surrounding islands of equally fit genotypes. The fitnesses can be either correlated |12) 
or uncorrelated [13j. 

4. rugged fitness landscape }14l I15| : the fitnesses are drawn as independent, identically 
distributed random numbers from some continuous distribution p{(p), the same for all 
genotypes. 

Most analytical results has been obtained for the single-peaked landscape (1), whereas the 
most realistic one is probably the rugged landscape (4). However, what all these landscapes 
have in common is the emergence of localized "quasispecies" and (for some of them) the 
existence of the error threshold. 

As already mentioned, the quasispecies is a set of closely related sequences which occupy 
a finite area in the genotype space. The name "quasispecies" comes from an apparent simi- 
larity to a real- world situation in which genotypes corresponding to organisms of the same 
species form a "cloud" of mutants in the genotype space around the best-fit genotype. The 
quasispecies exists for any mutation rate 7 smaller than some critical 7c, because in this 
case the steady state solution n* oc ipi is localized around (usually) the maximal fitness, see 
Fig. [TJ Above 7c, the steady state becomes delocalized and spreads over the whole genotype 
space. The critical 7c can be easily calculated [2] for the single-peak fitness landscape: 

-'-(If)'" 

where (pi, (p2 are fitnesses of the best-fit genotype and less-fit genotypes, respectively. Above 
7c, the best-fit genotype (the master sequence) vanishes from the population. This critical 
7c is called the error threshold and the transition from localized to delocalized quasispecies 
is known as the error catastrophe. For increasing L and (p2/(pi kept constant, the critical 
mutation rate scales as 7c ~ ^/L, thus longer sequences have smaller error thresholds. 

3. Quasispecies model with parallel mutation and selection 

The quasispecies model has an even simpler counterpart — para-mu-se (parallel mutation 
and selection) model f5|. A key feature of this model in comparison to Eq. ([2]) is that growth 
and mutation are decoupled: 

genotype i > 2 genotype i, (11) 

genotype i > genotype i -|- genotype j. (12) 

Here A is an adjacency matrix of a graph of possible mutations and 7 is the mutation rate. 
It is usually assumed that the matrix A is symmetric (forward and reverse mutations have 
the same probability) and that Aij = 1 if the Hamming distance d{i,j) = 1. Therefore, 
mutations can change at most one symbol per replication. Then, A is the adjacency matrix 
of the hypercube graph, see Fig. [2j 
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Fig. 1. Abundances n* of genotypes in the steady state in the single-peak model, for L = 7 (which 
corresponds to = 128 genotypes), 0i = 10, 02 = • ■ • = (j^N = 1, thus the fittest sequence being at 
i = 1, and for two mutation rates 7 = 0.01 (left, well below the error threshold 7c(i = 7) w 0.28) and 
7 — 0.4 (middle, above the threshold). Genotypes which correspond to the same Hamming distance 
from the master sequence i = 1 (the same "error class") have the same abundance. Right: plot of Ui 
for different error classes (lines from top to bottom) as a function of 7, for L = 30. The transition, 
which is clearly visible at 7c(i — 30) ~ 0.074, becomes sharper for increasing L. 
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Fig. 2. Examples of mutation graphs for the para-mu-se model, for L = 2, 3, 4. 



The model is known to have the error threshold for some fitness landscapes [6]^ and for 
small 7 (for which Qij ~ 7^jj in Eq. ([1])) it behaves very much the same as the quasispecies 
model from Section 2. However, the para-mu-se version of the quasispecies model is generally 
simpler from a mathematical point of view. From now on, I will focus on this model, although 
general conclusions of this work remain valid also for the original quasispecies model. 

In the limit of infinite populations, the para-mu-se model is described by the following 
set of equations (cf. Eq. ([3])): 

d ^ 

—ni{t) = ni{t){(t>i - J(t)) +7^^ij(nj(t) -ni(t)). (13) 

Applying the same transformation ^ as before, the above set is reduced to the linear form: 

^x(t) = Wx{t), (14) 

where the matrix W is now defined as Wij = 5ij(j)i and Ajj = Aij — 6ij Aik is the 

graph Laplacian. If Ai > A2 > • • • > Aat and {ipi} are again the eigenvalues and eigenvectors 
of W, the solution of Eq. (|14p is given by the same Eq. ^ as for the quasispecies model, 
but with the new matrix W. 



^ For the single-peak landscape, eigenvector ipi is always localized, but the net growth rate of the master 
sequence can be negative above some critical 7c, which may be thought as a sort of the error catastrophe. 
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4. Localization, random matrices and quasispecies 

The quasispecies or the para-mu-se model can be solved exactly in some special cases. 
The purpose of this article is to show that in the case of random, rugged fitness landscape 
which is much more difficult to study, there is an interesting analogy between the para-mu- 
se model, Anderson localization and RMT. Although the connection between localization 
theory and some evolutionary models in the continuous genotype space was made already 
more than 25 years ago [16j, the analogy (however obvious it may seem) between the para- 
mu-se model and RMT is not widely appreciated. 

In what follows I will assume that the fitnesses {(pi} are random numbers drawn from 
some distribution p{(f)). The quasispecies matrix W, 



W 



h ... 

Ct>2 ... 



... 



9 AT 



+ 7A = D + 7A, (15) 



is the sum of a diagonal, random matrix D = diag((/>i, . . . , (j)N), and a Laplacian matrix 7A. 
The eigenproblem of W: 

'^{(pi^ij + = AV'j (16) 

can be translated to the following Schrodinger equation: 

- ^ Aij-^j + Viil:i = Ei/Ji, (17) 
j 

where Vi = —(pi/^ and E = —A/7. This is precisely the Anderson model of localization 
on arbitrary lattices (see, e.g., Ref. |17]) with a random potential Vi. High fitness values 
correspond to low potential values, and the ground state corresponds to the steady state 
n* oc ipi (the quasispecies). 

The matrix W has random elements only on the diagonal, which makes it different from 
what is usually considered to be a random matrix — a matrix with a finite fraction of elements 
being (possibly correlated) random numbers. Such "dense" random matrices, which form the 
core of random matrix theory, appear in many problems in physics |18| . telecommunication 
and information theory [19j, and quantitative finance |20j. However, matrices in which only 
diagonal elements (or elements in a narrow band) are random, are also quite abundant in 
physics, in particular in quantum chaos [21J [22l |23] and Anderson localization problems 
|24^ [25] [26l [271 I28| . The matrix W with random fitness values defined above belongs to 
the same class of sparse random matrices. There is, however, one important difference: 
typical systems studied in the framework of localization theory are usually low-dimensional, 
except for Bethe lattices [171 124) . The reason is that low-dimensional systems can model 
real physical situations like transport properties of disordered solids [29\. In addition, many 
analytical results have been obtained for Id or Bethe lattices due to their special, simple 
structure. In contrast, the quasispecies problem is multi-dimensional, since the Laplacian 
A is defined on the hypercube graph, and the matrix W, albeit sparse, has a non-trivial 
structure. Although this can generally make analytical calculations very hard, it will not be 
relevant to the problems studied in this work. 

Random matrix theory deals primarily with eigenvalues and eigenvectors of matrices. 
Table 1 lists some of typical quantities calculated in RMT. It turns out that some of them 
are directly related to quantities relevant to the quasispecies theory. The first example I shall 
consider are spectral properties of the matrix W. A simple reasoning shows that differences 
between nearest eigenvalues of W determine typical timescales in the model. In the limit of 
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small mutation rate 7 ~ 0, is almost diagonal and the eigenvalues {Aj} are equal to the 
fitnesses For simplicity, let the fitnesses decrease with i, so that the ordered eigenvalues 

are = cp^. All eigenvectors are then trivially localized: ipij = 6ij, and the best adapted 
genotype corresponds to the eigenvector ipi, the second best adapted one to the eigenvector 
V'2i and so on. If the population is initially localized at the least adapted genotype A^, then 
scalar products • x(0) decay exponentially fast with increasing Hamming distance d{N, i). 
Writing the solution of Eq. (I14|) as 



x{t) = e*^f (0) = e*^' (^i^(O)) A 

i 

(V^if (0)) V^i + e-*(^i-^2) [(^"*2x(0)) ^2 + e-*(^2-A3) J(^^"*3f(o)) ^3 + . . . 



reveals that the contribution of the eigenvectors ipN^ipN-i, • • • to x{t) first increase (in 
this order) and then decay with rates Aat-i — Atv, • • • , A2 — A3, Ai — A2- The rates Aj — Aj+i 
gives different timescales in the system. In particular. 



1 



Ai — A2 



(18) 



is the characteristic time to reach the steady state n* oc ipi. Other differences Aj — Aj+i 
correspond to characteristic times 

related to "jumps" between locally adapted quasispecies (Fig. [3l left). If the distribution of 
differences Aj — Aj+i is known, one can calculate the average time between these events, and 
hence estimate the speed of evolution. However, the probability distribution of differences 
Sn{s) with s = \n — \n+i is just the nearest-neighbour spacing distribution, which is very 
commonly used in RMT to study short-range fluctuations in the spectrum (Fig. [3l right). 
Therefore, the problem of time evolution in the quasispecies theory is equivalent to the 
problem of flnding 5„(s) for a particular ensemble of random matrices W. The rest of 
the paper is devoted to calculating this quantity and interpreting it from a quasispecies 
perspective. 



Table 1. Correspondence between quantities of interest in RMT and in the quasispecies theory 



RMT Quasispecies 

level spacing o statistics of jumps in fltness space 

distribution of maximal eigenvalue -f-)- steady-state total abundance 

localization of eigenvectors -f-)- error threshold 

participation ratio of eigenvectors f-)- genetic diversity 



Before I proceed to calculations of Sn{s), I will briefly mention other similarities between 
the quasispecies theory and RMT. For example, the formula for the steady-state abundances 
n* from Eq. ([9]) shows that the maximal eigenvalue Ai plays the role of the total abundance 
of all possible genotypes. The distribution of the maximal eigenvalue is frequently studied 
in the framework of RMT. 

Finally, the statistics of eigenvectors {tpi} of some random matrices turns out to be 
very important for localization problems. In particular, participation ratio of eigenvectors is 
studied as a measure of localization. However, the localization transition in matrix models 
corresponds to the error catastrophe in the quasispecies model, which shows yet another in- 
teresting connection between RMT and the quasispecies theory. Table [1] provides a summary 
of all analogies mentioned in this work. 
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time 



Fig. 3. Left: a schematic view of jumps in the fitness space. Red line shows the trajectory of 
the most populated genotype. Typical timescale t is inversely proportional to the separation s of 
eigenvalues of W. Right: two level spacing distributions - exponential and Wigner's surmise - 
frequently encountered in RMT. 



5. Level spacing distribution 

In this section I will discuss the level spacing distribution 5„(,s), and its average over all 
nearest-neighbour pairs of eigenvalues 

N-l 

^(^) = E ^-(^) (20) 

n=l 

for the matrix W in the para-mu-se model. I will assume the uniform distribution of fitness: 
p{4') = 1 in the range 0. . . 1. This ensures that there is always a maximal and a minimal 
fitness, which is biologically relevant (the growth rate cannot be arbitrarily large), and that 
for large = 2^, max{</>i, . . . , c/iat} — )■ 1 is bounded from abov^. At last, the uniform 
distribution of 0j allows one to draw yet another link to Anderson localization, in which site 
potentials are also uniformly distributed |31) . 

As explained above, dynamical properties of the quasispecies or the para-mu-se model 
can be inferred from the distribution Sn{s)- One of the most important results of RMT is 
that eigenvalues usually "repel" each other in the spectrum |18) so that S{s) is zero at s = 0. 
In particular, for Gaussian random matrices, we have to a good approximation 

S{s) ^ se-^'/^ (21) 

which is known as the Wigner surmise. This level repulsion is characteristic for interacting 
systems, in which eigenvalues are correlated. For uncorrelated eigenvalues, the level-spacing 
distribution is exponential, S{s) = , for the unfolded spectrum, i.e., after transforming 
all eigenvalues such that their spectral density is uniform. Both the Wigner surmise and the 
exponential distribution are plotted in Fig. |3l right. 

It is evident from Eq. (I15|) that the mutation rate 7 plays the role of interaction strength, 
so we expect that S{s) should be exponential for 7^0, and that it will show signs of level 
repulsion for 7 > 0. This is indeed seen in Fig. 21 in which I plot S{s) obtained from 
numerical diagonalization of W for uniform fitness distribution and various mutation rates. 
A simple argument shows that for L large enough, 

S{s)^Nf{jL,Ns), (22) 

where f{g,x) is a (yet unknown) semi-positive function. This means that S{s) obtained for 
different lengths L and for mutation rates 'j = g/L with some arbitrary g should collapse 



^ Note that for p{(j)) = e which is often assumed by various authors [301 US] 1 max{<^i} ~ L. 
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Fig. 4. Level-spacing distribution S{s) for L = 7 and 7 = 0.005, 0.01, 0.02, 0.03, 0.05, 0.1 (curves from 
left to right). 



to a single curve when plotted in the rescaled variable x = Ns, i.e., when we "blow up" 
the spectrum of eigenvalues. This can indeed be seen in Fig. [5j The scaling form (j22p 
has a simple motivation. Firstly, the number of eigenvalues of the matrix W is A^. Since 
W = D + 7 A, for small 7 we expect that the spectrum of W will have a similar width as 
the spectrum of D. But the spectrum Pd{^) of the diagonal matrix D reads pnW = 
where p{(j)) = 1 for < (/> < 1, and it has a finite support of length one. Therefore, the 
average distance between the eigenvalues of W must scale as 1 /N, which explains the factor 
blowing up the nearest-level spacing in Eq. ()22p . 

Secondly, the net growth rate of genotype i is (pi — Lj + ^Ylj ^ijiT'j/n'i = (pi + 0{'jL), thus 
7 appears in the para-mu-se equations ()13p always as a product of L and 7. We thus expect 
that jL should be the relevant variable for the balance between growth and mutation. To 
make it more explicit, we observe that for small 7, the quasispecies is localized at the largest 
fitness, which for the uniform distribution p{4>) is (pi ~ 1. This best adapted genotype 
is surrounded by a sea of less adapted genotypes with average fitnesses (p2 ~ 1/2. The 
relative abundances of the best adapted genotype, xi, and a less-adapted one, say, X2, can 
be approximately determined from 

±1 (t) = XI [t] {(Pi - L-f) + L7X2 (t) , (23) 
i2(t) = X2it)(P2+7Xi{t). (24) 

Here I have assumed that mutations of the less-adapted genotype produce mainly genotypes 
from the same, less-adapted class of genotypes, therefore there is no loss term due to muta- 
tions in the second equation. Then, the difference of eigenvalues of the corresponding 2x2 
matrix W reads 

Ai - A2 = VL^7^ + (</'i - + 2L7(27 - (Pi + (P2), (25) 

and it is evident that (assuming that 2j <^ (pi — (p2 ^ 1/2) s = Ai — A2 depends only on the 
product of L7 in the limit of small mutation rate. 

The A^-scaling from Eq. (|22|) can be deduced analytically for uniform p{(p) in the limit of 
7 — )• 0. In this limit, as already mentioned, the eigenvalues of W are distributed uniformly 
between and 1. This means that, effectively, there are no interactions in the system, so 
S{s) for the unfolded spectrum should be exponential. This is very easy to check. The 
probability Sn{s) that the difference for an ordered pair of eigenvalues A„ > A„+i will be s 
is given by 

dXi dX2--- dXNp{Xi)---piXN)S{Xn+i- K- s), (26) 

00 ^ Ai J Xf 



JV-1 



where 



/OO POO /'OO 

dXi / dX2--- dXNpiXi)---p{XN). (27) 

-00 ^ Ai ^ A AT— 1 
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Fig. 5. Rescaled S{s) for L = 4, 5, 6, 7, 8, 9, 10, and 7 = g/L, where g = 0.04 (left), 0.2 (middle) and 
1 (right). For g = 0.04 some spikes are visible for s = 2jN. These spikes are finite-size effects, which 
vanish with increasing L. 
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Fig. 6. Rescaled distribution of the gap Si{s) between two largest eigenvalues, for L = 4, . . . , 10, 
7 = g/L, and g — 0.04 (left), g = 0.2 (middle) and g — 1.0 (right). The distribution shifts to the 
right for increasing L. 



Equations (p6|) and (j27p can be evaluated for the uniform fitness distribution. We obtain 
Z = l/m and 

Sn{s) = N{1 - s)^ « iVe-^^ (28) 

which does not depend on n, thus the average level-spacing is also S{s) = Ne^^'^. For large 
N, S{s) scales as in Eq. ([22]) with 7 = and 

fix) = e-\ (29) 

Such an exponential decay is indeed visible in Fig. [5l left. 

Numerical simulations presented in Fig. [5] indicate that the scaling form (j22p is valid 
also for 7 = g/L > 0, i.e., when 7 scales inversely with the length L of the sequence. For 
g small enough, the same scaling holds for the distribution 5*1(5) of the gap between two 
largest eigenvalues s = Ai — A2, see Fig. [6l left. However, for large g, the distributions 
shift to larger s with increasing L (Fig. [6] middle and right). Figure [7] shows plots of the 
participation ratio divided by the total number of genotypes, 

^^=^(E^M^ /E^M' (30) 

which measures how strongly the quasispecies is localized: PR ~ for eigenvectors with 
only few entries larger than zero, whereas PR ~ 1 means that all entries are roughly the 
same. In Fig. [71 PR ~ for small g, but for sufficiently large g, PR is of order one. This 
means that the principal eigenvector covers a finite fraction of the genotype space, hence the 
quasispecies is no longer localized. This indicates a transition similar to the error catastrophe 
in the quasispecies model. 
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Fig. 7. Participation ratio divided by = 2^ for g = 0.04 (solid), g = 0.2 (dashed) and g ~ 1.0 
(dotted). 



Using the scaling hypothesis for the level-spacing distribution which, as we have seen, is 
valid for any mutation rate 7 small enough (so that the quasispecies remains localized), the 
statistics of jumps can be inferred very easily. The distribution -Pjumps(''") of times between 
jumps is given by 



The same formula holds for Pss(t), the distribution of times to steady state, because the 
gap distribution 5*1(5) ~ S{s) for localized quasispecies. This means that the mean time to 
reach the steady state, which is the time it takes until the quasispecies stops evolving, is 



and it grows linearly with N (or exponentially with the length of the sequence L). For 7 = 0, 
the above integral is divergent, because f{'yL,0) > 0. This is correct, because in the absence 
of mutations there is no way to reach the steady state from any other point in the genotype 
space. But for 7 > 0, we have /(7L,0) = due to level repulsion. For small x, f{'^L,x) 
is proportional to x as it can be seen in Fig. [5j Therefore, ^f{'yL,x)dx < 00, and the 
average time (r) is finite. 

Equation ([3T]) tells us that for ^ r <C N/^, for which /(jLjx) ~ const, the distri- 
bution of times between jumps follows a power law: Pjumps(''") ^ t^"^- This is also true for 
the time to reach the steady state, Pss(''") ~ t""^ ■ In Fig- Ell show plots of the cumulative 
distribution of times between jumps measured directly by solving differential equations (|13p 
numerically for 5000 random fitness landscapes, and tracing the position of the maximal 
abundance nj. A jump was recorded whenever this maximal abundance changed its location 
in the genotype space. In this way, the statistics of times between jumps was obtained. For 
each fitness landscape, the simulation was stopped when the difference |n* — ni{t)\/N 
between the steady-state solution and ni{i) was smaller than 10~®. In this way, also the 
statistics of times to steady state was collected. The experimental cumulative distributions 
Cjumps(T) = Piumps{r')dT' and Css(t) = Pss{T')dT' presented in Fig. [8]show a power- 
law behaviour with an exponent close to minus one, Cjumps(''") ~ Css(''") ~ , in good 
agreement with theory. 

The power-law behaviour of -Pjumps(''") for individual jumps as well Pss(''") for the time to 
reach steady state has been observed in a simplified "shell" model in the strong selection limit 
[301 132| . which would correspond to 7 — )• in our model. We see that a simple observation, 
which relates the dynamics of the quasispecies to the level spacing distribution, allows one 
to extend this result to the case of 7 = i^/L > 0. 



6. Statistics of jumps 




(31) 




(32) 
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Fig. 8. Left: Plots of the cumulative distribution of jumps Cjuinps(T) = Piumps{T')dT' for L = 
4,5,6,7,8 (from left to right) and 7 — 0.04/i. Solid line corresponds to the theoretical result 
C(t) ^ T^-^. Right: Css(t) for the time to reach steady state. 

7. Conclusion 

The main objective of this work was to present an interesting analogy between the qua- 
sispecies theory (in particular, the para-mu-se model) and random matrices which appear 
in localization theory. I discussed how static and dynamical properties of the quasispecies 
model are related to quantities such as nearest-level spacing distribution or participation 
ratio of eigenvectors, which are typically calculated within the framework of random matrix 
theory. Although most of the results presented here were obtained in numerical simulations, 
they were all corroborated by simple, mathematical calculations. It remains a challenge to 
calculate analytically the level-spacing distribution for non-zero mutation rates. 

The analogy to Anderson localization mentioned here has been already mentioned in 
Refs. [HI |T6] and, more recently, in Ref. [9j, in which some results of localization theory 
for Id tight-binding models are used to find the point of the phase transition in a model 
with two quasispecies linked by migration. However, no systematic studies of localization 
on the hypercube with random distribution of fitness (site potential) have been made so far. 
This would be potentially a very interesting research area which could further link biological 
evolution models, localization theory, and random matrices. 
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